R is a functional programming language. This means that functions are "objects", just like data frames, vectors, and other things that are assigned to variables and passed to other functions.
November 14, 2016
R is a functional programming language. This means that functions are "objects", just like data frames, vectors, and other things that are assigned to variables and passed to other functions.
The name of a functions is actually the name of a variable that contains the function, in the same way that the
log
## function (x, base = exp(1)) .Primitive("log")
This means that we can create a copy of a function by assigning its value to a new variable.
myLogFunction <- log myLogFunction
## function (x, base = exp(1)) .Primitive("log")
myNumber <- 7 class(myNumber)
## [1] "numeric"
class(log)
## [1] "function"
We can create a new function using the word "function" followed by the functions arguments and one or more R statements.
myDumbFunction <- function() 42 myDumbFunction()
## [1] 42
If there is more than one statement in a function, they should be enclosed in curly brackets:
doubleIt <- function(x) {
myResult <- x * 2
myResult
}
doubleIt(5)
## [1] 10
The last statement within the curly brackets will be the value returned by the function.
Inside a function, variables that existed in your environment can be used and even changed. However, any changes made, including changing data stored in variables and creating new variables, happens solely within the function. Your environment stays the same.
exists("myResult")
## [1] FALSE
myResult <- 1000 doubleItOutput <- doubleIt(2) myResult
## [1] 1000
The data set used in today's lecture comes from an siRNA screen that we published a few years ago. The screen looked for genes that influence parkin translocation.
High-content genome-wide RNAi screens identify regulators of parkin upstream of mitophagy. Hasson SA, Kane LA, Yamano K, Huang CH, Sliter DA, Buehler E, Wang C, Heman-Ackah SM, Hessa T, Guha R, Martin SE, Youle RJ. Nature. 2013.
The data set will be available for download from the lectures portion of the class web page.
ambion <- read.csv("nature.parkin.gw.ambion.csv", stringsAsFactors = FALSE)
str(ambion)
## 'data.frame': 65196 obs. of 25 variables: ## $ Vendor.Supplied.Gene.Symbol : chr "LMAN1" "MED15" "PFN2" "KIAA1191" ... ## $ Sample : num 8.91 7.92 7.97 11.2 8.88 ... ## $ Median.Negative.Control.on.Plate : num 38.6 33.3 31.4 43.8 34.3 ... ## $ Median.Positive.Control.on.Plate : num 96.7 96.7 97.1 96.5 96.1 ... ## $ PPT.Sample.as.Percentage.of.Negative.Control : num 23.1 23.7 25.4 25.6 25.9 ... ## $ PPT.MAD.Z.Score : num -2.42 -2.4 -2.36 -2.35 -2.35 ... ## $ PPT.MAD.Log.MAD.Z.Score : num -4.82 -4.74 -4.54 -4.52 -4.49 ... ## $ Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence: num -2.056 -1.421 -1.421 -1.388 -0.485 ... ## $ Number.of.siRNAs.having.the.same.seed.sequence : int 23 152 152 28 18 28 152 28 75 18 ... ## $ Cell.Count..Sample : int 1286 1203 1177 1060 1272 1093 1170 854 1220 1136 ... ## $ Median.Negative.Control.Cell.Count.on.Plate : num 1278 1310 1260 1344 1256 ... ## $ Median.Positive.Control.Cell.Count.on.Plate : num 1030 1050 978 1106 985 ... ## $ Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol : num 1.109 0.241 0.399 -1.027 1.168 ... ## $ Sample.1 : num 8.94 16.79 11.3 11.13 12.89 ... ## $ Median.Negative.Control.Mitophagy.on.Plate : num 10.7 28.1 29 15.5 12.1 ... ## $ Median.Positive.Control.Mitophagy.on.Plate : num 22.4 49 50.1 27.7 22 ... ## $ Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control : num 0.0466 -0.8887 -1.7 -0.4281 0.9416 ... ## $ HUGO.Gene.Symbol : chr "LMAN1" "PCQAP" "PFN2" "KIAA1191" ... ## $ Entrez.GeneID : int 3998 51586 5217 57179 284221 650689 3998 374986 146723 90378 ... ## $ REFSEQ : chr "NM_005570" "NM_001003891" "NM_002628" "NM_001079684" ... ## $ DESCRIPTION : chr "lectin, mannose-binding, 1" "mediator complex subunit 15" "profilin 2" "KIAA1191" ... ## $ PLATE.ID : chr "QDA101771" "QDA101996" "QDA103288" "QDA103435" ... ## $ Row.Number : int 1 15 14 16 1 2 1 16 10 1 ... ## $ Column.Number : int 4 21 19 11 12 1 4 16 1 14 ... ## $ Ambion.siRNA.ID : chr "s8218" "s28366" "s10379" "s226909" ...
apply(ambion, 2, function(x) sum(is.na(x)))
## Vendor.Supplied.Gene.Symbol ## 0 ## Sample ## 441 ## Median.Negative.Control.on.Plate ## 441 ## Median.Positive.Control.on.Plate ## 441 ## PPT.Sample.as.Percentage.of.Negative.Control ## 441 ## PPT.MAD.Z.Score ## 441 ## PPT.MAD.Log.MAD.Z.Score ## 441 ## Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence ## 441 ## Number.of.siRNAs.having.the.same.seed.sequence ## 441 ## Cell.Count..Sample ## 441 ## Median.Negative.Control.Cell.Count.on.Plate ## 441 ## Median.Positive.Control.Cell.Count.on.Plate ## 441 ## Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol ## 441 ## Sample.1 ## 441 ## Median.Negative.Control.Mitophagy.on.Plate ## 441 ## Median.Positive.Control.Mitophagy.on.Plate ## 441 ## Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control ## 441 ## HUGO.Gene.Symbol ## 0 ## Entrez.GeneID ## 441 ## REFSEQ ## 0 ## DESCRIPTION ## 0 ## PLATE.ID ## 0 ## Row.Number ## 441 ## Column.Number ## 441 ## Ambion.siRNA.ID ## 0
ambion[is.na(ambion[,1]),][1,]
## Vendor.Supplied.Gene.Symbol Sample Median.Negative.Control.on.Plate ## NA <NA> NA NA ## Median.Positive.Control.on.Plate ## NA NA ## PPT.Sample.as.Percentage.of.Negative.Control PPT.MAD.Z.Score ## NA NA NA ## PPT.MAD.Log.MAD.Z.Score ## NA NA ## Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence ## NA NA ## Number.of.siRNAs.having.the.same.seed.sequence Cell.Count..Sample ## NA NA NA ## Median.Negative.Control.Cell.Count.on.Plate ## NA NA ## Median.Positive.Control.Cell.Count.on.Plate ## NA NA ## Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol Sample.1 ## NA NA NA ## Median.Negative.Control.Mitophagy.on.Plate ## NA NA ## Median.Positive.Control.Mitophagy.on.Plate ## NA NA ## Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control ## NA NA ## HUGO.Gene.Symbol Entrez.GeneID REFSEQ DESCRIPTION PLATE.ID Row.Number ## NA <NA> NA <NA> <NA> <NA> NA ## Column.Number Ambion.siRNA.ID ## NA NA <NA>
ambion <- ambion[! is.na(ambion[,2]), ] apply(ambion, 2, function(x) sum(is.na(x)))
## Vendor.Supplied.Gene.Symbol ## 0 ## Sample ## 0 ## Median.Negative.Control.on.Plate ## 0 ## Median.Positive.Control.on.Plate ## 0 ## PPT.Sample.as.Percentage.of.Negative.Control ## 0 ## PPT.MAD.Z.Score ## 0 ## PPT.MAD.Log.MAD.Z.Score ## 0 ## Median.PPT.Log.Mad.Z.Score.for.all.siRNAs.having.the.same.seed.sequence ## 0 ## Number.of.siRNAs.having.the.same.seed.sequence ## 0 ## Cell.Count..Sample ## 0 ## Median.Negative.Control.Cell.Count.on.Plate ## 0 ## Median.Positive.Control.Cell.Count.on.Plate ## 0 ## Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol ## 0 ## Sample.1 ## 0 ## Median.Negative.Control.Mitophagy.on.Plate ## 0 ## Median.Positive.Control.Mitophagy.on.Plate ## 0 ## Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control ## 0 ## HUGO.Gene.Symbol ## 0 ## Entrez.GeneID ## 0 ## REFSEQ ## 0 ## DESCRIPTION ## 0 ## PLATE.ID ## 0 ## Row.Number ## 0 ## Column.Number ## 0 ## Ambion.siRNA.ID ## 0
Often it will be helpful to create a new data frame with only the data we wish to analyze.
ambion.simple <- ambion[,c(19,25,1,21,7,13,17)] ambion.simple[1,]
## Entrez.GeneID Ambion.siRNA.ID Vendor.Supplied.Gene.Symbol ## 1 3998 s8218 LMAN1 ## DESCRIPTION PPT.MAD.Log.MAD.Z.Score ## 1 lectin, mannose-binding, 1 -4.82223 ## Sample.Cell.Count..MAD.Z.Score.Normalized.to.Negative.Contol ## 1 1.108518 ## Sample.Mitophagy..MAD.Z.Score.Normalized.to.Negative.Control ## 1 0.04662911
library(knitr, quietly = TRUE)
colnames(ambion.simple) <- c("GeneID","siRNA","Symbol","Description",
"PPT","Cells","Mitophagy")
kable(head(ambion.simple, n=4), format = "markdown")
| GeneID | siRNA | Symbol | Description | PPT | Cells | Mitophagy |
|---|---|---|---|---|---|---|
| 3998 | s8218 | LMAN1 | lectin, mannose-binding, 1 | -4.822230 | 1.1085178 | 0.0466291 |
| 51586 | s28366 | MED15 | mediator complex subunit 15 | -4.739415 | 0.2405872 | -0.8887362 |
| 5217 | s10379 | PFN2 | profilin 2 | -4.539309 | 0.3994161 | -1.7000123 |
| 57179 | s226909 | KIAA1191 | KIAA1191 | -4.516443 | -1.0274124 | -0.4280631 |
library(ggplot2, quietly = TRUE) ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5)
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_point(alpha=0.5) +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="blue")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="blue")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle("PARK2")
description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1] description
## [1] "Parkinson disease (autosomal recessive, juvenile) 2, parkin"
myTitle <- paste("PARK2",description,sep=": ")
myTitle
## [1] "PARK2: Parkinson disease (autosomal recessive, juvenile) 2, parkin"
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle(myTitle)
Now that we have our custom plot looking right, we would like to be able to do the same for other genes but without so much typing. First, make a new R Script in RStudio:
Frequently, making a function will simply be a function of selecting the right parts of your history and hitting the "to source" button.
graphGene <- function(gene) {
description <- ambion.simple$Description[ambion.simple$Symbol == "PARK2"][1]
myTitle <- paste("PARK2",description,sep=": ")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == "PARK2",],
color="red", shape=17, size =5) +
ggtitle(myTitle)
}
graphGene <- function(gene) {
description <- ambion.simple$Description[ambion.simple$Symbol == gene][1]
myTitle <- paste(gene,description,sep=": ")
ggplot(ambion.simple, aes(x=PPT, y=Cells)) + geom_density2d() +
geom_point(data = ambion.simple[ambion.simple$Symbol == gene,],
color="red", shape=17, size =5) +
ggtitle(myTitle)
}
graphGene("PINK1")
pdfGene <- function(gene, file=paste(gene,".pdf", sep="")) {
pdf(file, width=5, height=5)
graphGene(gene)
dev.off()
}
We can use the ellipse notation (…) to indicate that extra arguments to our function should be passed on to a function that is inside our function (in this case pdf).
pdfGene <- function(gene, file=paste(gene,".pdf", sep=""), ...) {
pdf(file, ...)
graphGene(gene)
dev.off()
}
pdfGene("PINK1", width=10, height=10)
## quartz_off_screen ## 2
We can decide whether something happens in our function using "if" and "if/else".
sillyFunction <- function(x) {
if (x < 5) {
returnValue <- x
}
else {
returnValue <- x / 2
}
returnValue
}
sillyFunction(12)
## [1] 6
pdfGenes <- function(genes, file=paste(gene,".pdf", sep=""), ...) {
pdf(file, ...)
for (gene in genes) {
graphGene(gene)
}
dev.off()
}
pdfGenes(c("PLK1","PINK1","BRCA1"))